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ABSTRACT 

A growing controversy surrounds the strict 
interpretation of statistical significance tests in social research. 
Statistical significance tests fail in particular to provide 
estimates for the stability of research results. Methods that do 
provide such estimates are known as invariance or cross-validation 
procedures. Invariance analysis is largely an untested science which 
is applied to determine how stable the statistical results are likely 
to be across different samples. It can be employed with any 
parametric procedure. The details of invariance analysis vary 
according to the analytic technique employed. Cross-validation 
procedures appropriate for multiple regression and its multivariate 
extension, canonical correlation analysis, are discussed in this 
paper, and a concrete example is presented. (Author/JAZ) 
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ABSTRACT 



A growing controversy surrounds the strict interpretation of 
statistical significance tests in social research. Statistical 
significance tests fail in particular to provide estimates for 
the stability of research results. Methods that do provide such 
estimates are known as invariance or cross-validation procedures, 
and they can be applied in most analyses. Cross-validation 
procedures appropriate for multiple regression and its 
multivariate extension, canonical correlation analysis, are 
discussed in this paper, and a concrete example is presented. 
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If we take in our hands any volume of school 
metaphysics, for instance, let us ask, "Does 
it contain any abstract reasoning concerning 
quantity or number?" No. "Does it contain any 
experimental reasoning concerning matter of 
fact and existence?" No. Commit it then to 
the flames, for it can contain nothing but 
sophistry and illusion. 

David Hume (quoted in Will Durant, 
The Story of Philosophy) 

Statistical significance testing is the "experimental 
reasoning" of choice among most researchers today, and while its 
absence in an empirical study may no longer be cause for 
commitment to flames, it may result in notices of rejection from 
publishers or from dissertation committees. Nearly 30 years ago, 
however, Selvin (1957) publicly questioned the value of 
statistical significance testing as an inferential tool in social 
research. Selvin's article initiated a controversy which 
continues to this day, with increasingly formidable artillery 
ranged on the side of significance testing's opponents. 

The philosophy of statistical significance testing assumes 
an abstract simplification of the reality in which social 
scientists are interested. In a universe of human behavior 
shaped by complex relationships among large numbers of variables, 
the statistical significance test can only provide a binary 
solution (that is, a simple "yes or no" answer) to a single 
question of relatively little inherent interest: is the null 
hypothesis to be rejected? 

The logic of statistical significance testing is at first 
compelling, for it is based on the perfectly reasonable 
assumption that the larger two random samples are, the closer 
should be their means on any measure of interest, provided that 
the samples are from the same population. However, the 
mathematical dependence of statistical significance upon sample 
size can make even negligible research results appear 
"important." Carver (1978) observed that "a mean difference that 
is small and not significant from a research standpoint can be 
statistically significant just because enough subjects were used 
in the experiment to make the result statistically rare under the 
null hypothesis" (p. 388). 
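
Carver's point can be illustrated with a minimal sketch. It assumes a simple two-sample z-test with a known common standard deviation; the effect size of 0.05 standard deviations and the two sample sizes are invented for illustration and do not come from any study cited here.

```python
import math

def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference between two group means,
    assuming a known common standard deviation (simple z-test)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = abs(mean_diff) / standard_error
    # P(|Z| > z) for a standard normal variable.
    return math.erfc(z / math.sqrt(2.0))

# The same negligible mean difference, tested with two sample sizes.
p_small_sample = two_sample_z_p_value(mean_diff=0.05, sd=1.0, n_per_group=100)
p_large_sample = two_sample_z_p_value(mean_diff=0.05, sd=1.0, n_per_group=10000)

print(f"n = 100 per group:   p = {p_small_sample:.4f}")
print(f"n = 10000 per group: p = {p_large_sample:.6f}")
```

The mean difference is identical in both calls; only the number of subjects changes, and with it the verdict of the significance test.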

The typical null hypothesis, which postulates the absence of 
"variance explained," is usually of little inherent interest. 
Furthermore, rejecting a null hypothesis is generally done on the 
basis of criteria (for example, the 5% significance level) which, 
however reasonable they may be, are nonetheless arbitrary. Too 
much light focused on the "significance" of a null hypothesis can 
leave the most meaningful implications of an experiment in total 
darkness. Lykken (1968) warned that "[Finding statistical 
significance] is never a sufficient condition for concluding that 
a theory has been corroborated, that a useful empirical fact has 
been established with reasonable confidence--or that an 
experimental report ought to be published" (p. ). 

Another problem confronted in statistical significance 
testing is that, as Cotton (1967, p. 57) pointed out, the null 
hypothesis in social research is almost never exactly true: 
rarely, if ever, will two variables share a correlation 
coefficient exactly equal to zero. Cotton argued that "accepting 
[a null hypothesis] merely expresses a belief that the average 
difference is near zero" (p. 57), but as already noted, even 
average differences that are "near zero" can be forced to assume 
unwarranted significance when samples are large enough. 
Furthermore, apart from the fact that it is almost never exactly 
true, the null hypothesis in any given experiment is but one of 
an infinite number of possible research hypotheses, and rarely is 
it the most illuminating one. 

The most interesting research results are those which, 
however significant statistically, can be generalized from a 
sample to a larger population. Small relationships that are 
consistent over samples are of potentially greater theoretical 
interest than are pronounced relationships that can be obtained 
for only one or two samples. Of course, the ideal research 
result would be large relationships that are consistent over 
samples. However, Carver (1978) has argued that statistical 
significance is not an index of reproducibility: statistical 
significance at the level P does not necessarily imply a 
probability of (1 - P) that another researcher following the same 
procedures will obtain the same results. The confounding of 
statistical significance and reproducibility, argued Carver, is 
one of three major prevailing misconceptions about statistical 
significance testing, the other two being that a statistical 
significance level represents the probability that results were 
obtained by chance, and that it represents the probability that 
the null hypothesis is true. 

Obviously, the only genuine way to establish the 
replicability of research results is actually to repeat the 
study on as many samples as possible. Unfortunately, this is 
rarely practical. However, the researcher can still obtain an 
estimate of the stability of his results across samples by 
employing so-called invariance or cross-validation procedures. 
These procedures are the subject of the discussion that follows. 

The logic of invariance analysis was summarized by Fish 
(1986) as follows: 

[Invariance procedures] attempt to determine how stable 
the statistical results are likely to be across 
different samples. In the typical invariance procedure 
an analysis is performed separately on each of two 
subgroups into which the study sample has been divided, 
and the results are compared. When the results of an 
analysis are not comparable (i.e., not invariant), 
serious doubts about the generalizability of the results 
are in order. (pp. 65-66) 

Apart from its value to theory building, a successful 
invariance analysis will create confidence that analytic results 
can be employed for practical ends. "Double cross-validation," 
argue Kerlinger and Pedhazur (1973), "is strongly recommended as 
the most rigorous approach to the validation of results from 
regression analysis in a predictive framework" (p. 284). 

Invariance procedures can be employed profitably with any 
parametric procedure. The details of invariance analysis vary 
according to the analytic technique employed, and because 
invariance analysis is still a relatively young technique, there 
is ample scope for imaginative applications. The remainder of 
this paper will focus on standard cross-validation procedures 
that can be employed with multiple regression and with its 
multivariate generalization, canonical correlation analysis. A 
concrete example will be discussed. 



Invariance procedures for multiple regression 

Recall that when there is one dependent variable y and two 
or more independent variables x(i), multiple regression analysis 
computes for each case (person) a composite score y' which is 
equal to a linear combination of that case's values on the 
independent variables, as follows: 

$$ y' = \sum_{j=1}^{p} \beta_{j} x_{j} \qquad (1) $$

For the entire study sample, the squared correlation coefficient 
between the composite scores and the actual values of the 
dependent variable is a measure of effect size, that is, the 
proportion of variance of the dependent variable that is shared 
with the independent variables. 

The invariance procedure for multiple regression consists of 
the following steps: 

1) The original sample is randomly divided into two 
invariance groups. Ideally, these two groups are of unequal size 
so as to obviate the objection that a satisfactory measure of 
invariance is dependent upon a particular sample size. 

2) Within each of the two groups, all variables are 
separately standardized into z-scores and independent regression 
equations are computed. For each case, an invariance composite 
score (y'(1,1) for cases in group 1, y'(2,2) for cases in group 
2; the meaning of the double subscript will become clear shortly) 
is computed from the appropriate equation. 

Group 1: 
$$ y'_{1,1} = \sum_{j=1}^{p} \beta_{1,j} x_{1,j} \qquad (2) $$

Group 2: 
$$ y'_{2,2} = \sum_{j=1}^{p} \beta_{2,j} x_{2,j} \qquad (3) $$

3) We now proceed to establish the invariance of the 
multiple regression equation computed for invariance group 1. We 
have already computed a set of composite scores y'(1,1) for the 
cases in this group. We now compute a second set of composite 
scores for each case, y'(1,2), using the beta-weights computed 
for invariance group 2. This is the key step of the invariance 
procedure. 

$$ y'_{1,2} = \sum_{j=1}^{p} \beta_{2,j} x_{1,j} \qquad (4) $$

The subscript of this new composite score refers to the fact that 
group 1 data were applied to group 2 regression coefficients. 
Throughout this paper, the subscript ij appearing below any 
composite score or correlation coefficient means that group i 
data were applied to group j weights. (Recall that earlier, 
y'(1,1) referred to composite scores computed from group 1 data 
substituted into the group 1 regression equation.) However, when 
the subscript ij appears below β or x, it refers to group i, 
independent variable number j. 

4) For group 1 we may now compute two multiple correlation 
coefficients: the group's own "genuine" coefficient R(1,1) 
between the set of y and the set of y'(1,1), and an invariance 
correlation coefficient R(1,2) between the set of y and the set 
of y'(1,2). R(1,2) cannot exceed R(1,1) because the latter is 
the mathematical optimum for group 1, but ideally the two 
coefficients will be very close. The difference between the 
squares of these two coefficients, R(1,1)^2 - R(1,2)^2 (recall 
that only squared correlation coefficients can be meaningfully 
compared), is an invariance estimate. The closer this value is to 
zero, the more stable the regression results may be assumed to 
be across samples. The correlation coefficient between the sets 
of composite scores y'(1,1) and y'(1,2) may also be taken as an 
invariance estimate. 

Naturally the procedures outlined above can be repeated with 
the roles of groups 1 and 2 reversed; that is, group 2 data can 
be substituted into the group 1 regression equation, and an 
invariance estimate computed for the group 2 regression equation. 
This would complete the invariance procedure known as "double 
cross-validation." 
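
The four steps, together with the reversal of roles, can be sketched in a few dozen lines. This is only an illustration under stated assumptions: the function names (`zscores`, `beta_weights`, `double_cross_validation`, and so on) and the raw scores are invented here, the groups are taken as already divided, and the beta weights are obtained by ordinary least squares on the within-group z-scores.

```python
import math

def zscores(values):
    """Standardize one variable within a group (sample SD, n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def beta_weights(z_predictors, z_y):
    """Least-squares beta weights from standardized predictor columns."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in z_predictors]
           for ci in z_predictors]
    xty = [sum(u * v for u, v in zip(ci, z_y)) for ci in z_predictors]
    return solve(xtx, xty)

def composite(z_predictors, betas):
    """Composite score for every case: sum over j of beta_j * x_j."""
    return [sum(b * x for b, x in zip(betas, case))
            for case in zip(*z_predictors)]

def double_cross_validation(group1, group2):
    """Return {1: estimate for group 1, 2: estimate for group 2}.

    Each group is (list of predictor columns, criterion column).
    """
    z = [([zscores(c) for c in cols], zscores(y))
         for cols, y in (group1, group2)]
    betas = [beta_weights(x, y) for x, y in z]
    estimates = {}
    for i in (0, 1):
        j = 1 - i
        x, y = z[i]
        r_own = pearson(y, composite(x, betas[i]))    # R(i,i)
        r_cross = pearson(y, composite(x, betas[j]))  # R(i,j)
        estimates[i + 1] = r_own ** 2 - r_cross ** 2
    return estimates

# Hypothetical raw scores, invented purely for illustration.
group1 = ([[1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 4]], [1, 3, 2, 5, 4, 6])
group2 = ([[2, 4, 1, 5, 3], [1, 3, 2, 5, 5]], [2, 3, 1, 5, 4])

estimates = double_cross_validation(group1, group2)
print(estimates)
```

Because each group's own weights are the least-squares optimum for that group, both estimates fall between 0 and 1, with values near zero indicating stable weights.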

Some comments are in order here. For one thing, it may be 
objected that the procedure described above establishes the 
invariance of the two group regression equations, not of the 
omnibus equation (1) that is presumably of primary interest. 
This objection has some merit, and reflects the status of 
invariance analysis as an imperfect substitute for genuine 
replication. However, both Mosier (1951) and Kerlinger and 
Pedhazur (1973) have argued that when the results of double 
cross-validation are satisfactory the omnibus regression equation 
may be confidently employed for predictive purposes. Presumably 
it may then be used for theoretical purposes as well. 

It may also be argued that no specific criteria were offered 
in the above discussion for evaluating the invariance estimates. 
This omission was deliberate, for no such criteria yet exist. As 
mentioned above, invariance analysis is a relatively young 
procedure, and many avenues remain to be explored. Thompson 
(1986), for example, has derived test criteria for invariance 
estimates computed for factor analysis. However, it is in some 
respects illogical to test the statistical significance of 
results that are meant, in a sense, to replace significance 
testing. 

One other item of useful information that can be derived 
from invariance analysis concerns multiple R. As mentioned above, 
multiple R is a mathematical optimum for a given sample, and as 
such it is a biased estimate that capitalizes on what Mosier 
called the "idiosyncrasies" of the sample. Naturally a more 
dependable estimate of multiple R for the target population would 
be desirable. Though no precise means yet exist for computing 
this more dependable estimate from invariance data, Mosier 
suggested that the mean of an invariance group's actual squared 
multiple R and the square of its invariance R might be taken as a 
provisional estimate. 
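
In the double-subscript notation used for group 1, Mosier's suggestion can be written as below. The symbol $\hat{R}^2$ for the provisional estimate is introduced here only for illustration and is not taken from the sources cited.

```latex
\hat{R}^{2} \approx \frac{R_{1,1}^{2} + R_{1,2}^{2}}{2}
```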



Invariance procedures for canonical correlation analysis 

Before discussing invariance procedures in canonical 
correlation analysis, it would be useful to recall that canonical 
correlation analysis is a multivariate generalization of multiple 
regression. Indeed, as argued in Thompson (1984), all parametric 
techniques are special cases of canonical correlation. Canonical 
correlation is the appropriate analytic technique when each 
variable set, predictor (independent) and criterion (dependent) 
variables, has two or more elements. 

As in multiple regression, canonical correlation analysis 
computes for each case a predictor composite value p' equal to a 
linear combination of the independent variables, x(i). 
Analogously, it computes a criterion composite value q' equal to 
a linear combination of the dependent or criterion variables, 
y(i). Naturally, as in multiple regression, the same function 
coefficients are used for all the cases in the sample. 

Predictor composite value: 
$$ p' = \sum_{i} a_{i} x_{i} \qquad (5) $$

Criterion composite value: 
$$ q' = \sum_{i} b_{i} y_{i} \qquad (6) $$

The correlation coefficient Rc between the set of predictor 
composites p' and the set of criterion composites q' is the 
canonical correlation coefficient. According to Thompson (1984), 
"a squared canonical correlation coefficient indicates the 
proportion of variance that the two composites derived from the 
two variable sets linearly share" (p. 14). It should be clear 
that this squared canonical correlation coefficient is the analog 
of multiple R squared in regression analysis. 

The two linear equations (equations 5 and 6) which give, 
respectively, the predictor and criterion composite scores are 
known together as a canonical function. The linear coefficients 
of a canonical function are derived so as to maximize the shared 
variance between the two composites for any function. It should 
be noted that more than one canonical function may be derived in 
an analysis, the number of such functions being equal to the 
number of variables in the smaller of the two variable sets. Each 
canonical function derived after the first maximizes the 
explained portion of variance not yet accounted for by any of the 
previously derived functions. The reader interested in pursuing 
the logic of canonical correlation analysis further should 
consult Thompson (1984). 

The logic of invariance analysis in canonical correlation is 
essentially the same as for multiple regression, discussed above. 
Once again, the sample is randomly divided into two invariance 
groups of unequal size. Within each of these two groups, the 
values of all variables are converted to z-score form and 
independent canonical correlation analyses are conducted. Because 
several canonical functions may be derived, it may be necessary 
to repeat the invariance procedure described below on as many of 
those functions as are considered to be statistically or 
educationally significant. 

Let us assume, then, that within each invariance group we 
have derived a canonical function, as follows: 

Group 1: 

Predictor composite: 
$$ p'_{1,1} = \sum_{i} a_{1,i} x_{1,i} \qquad (7) $$

Criterion composite: 
$$ q'_{1,1} = \sum_{i} b_{1,i} y_{1,i} \qquad (8) $$

Group 2: 

Predictor composite: 
$$ p'_{2,2} = \sum_{i} a_{2,i} x_{2,i} \qquad (9) $$

Criterion composite: 
$$ q'_{2,2} = \sum_{i} b_{2,i} y_{2,i} \qquad (10) $$

In the above equations, a(k,i) and b(k,i) represent respectively 
the standardized predictor and the standardized criterion 
function coefficients derived for invariance group k and variable 
i. In symbols representing composite scores and canonical 
correlation coefficients (that is, the letters p', q', and Rc), 
the double subscript ij means that the coefficient in question 
was derived from group i data using group j coefficients. This 
is the same notational format that was used earlier in the 
discussion of invariance for multiple regression. 

We shall now investigate the invariance of the group 1 
canonical function, always bearing in mind that the same 
procedure should be applied afterwards to the group 2 function as 
well. Thus within group 1, two more sets of composite scores, 
p'(1,2) and q'(1,2), will be computed using group 1 data but 
group 2 function coefficients, as follows: 

$$ p'_{1,2} = \sum_{i} a_{2,i} x_{1,i}, \qquad q'_{1,2} = \sum_{i} b_{2,i} y_{1,i} \qquad (11, 12) $$

A "new" canonical correlation coefficient, Rc(1,2), is computed 
for the two sets of composite scores, the set of p'(1,2) and 
the set of q'(1,2). Because Rc(1,1) is the mathematical optimum 
for group 1, it must be at least as large in absolute value as 
Rc(1,2). The difference between the squares of Rc(1,1) and 
Rc(1,2) is an invariance estimate for the group 1 canonical 
function. As in the case of multiple regression, this estimate 
will have a "best case" value of zero and a "worst case" value of 
1. 

The reader should note that the invariance procedure for 
canonical correlation analysis is analogous in almost every 
detail to the procedure discussed above for multiple regression; 
only the vocabulary is different. In fact, if the set of 
criterion variables in canonical correlation analysis contains 
only one member y, then canonical correlation analysis reduces to 
multiple regression. The original dependent variable takes the 
place of the criterion composite, and multiple R takes the place 
of the canonical correlation coefficient. 

An illustrative example 

This section illustrates the computation of an invariance 
estimate for canonical correlation analysis. Table 1 presents 
the hypothetical data set that will be used. This data set is 
small enough so that the reader, if interested, may follow the 
discussion with pencil and paper. Each variable set, predictors 
and criterion variables, contains two variables, and canonical 
correlation analysis will therefore yield two functions. The 
invariance of only the first function will be discussed below; 
the interested reader may wish to apply invariance procedures to 
the second function as an exercise. 

The first step of the invariance procedure is to divide the 
sample into two invariance groups. For convenience, the first 
five cases of the hypothetical data set have been placed into 
group 1, and the second five into group 2, though ideally two 
invariance groups should be randomly assigned and of unequal 
size. Table 2 presents the values of the variables after being 
converted to z-score form within each group. 

The next step of the procedure is to compute separate 
canonical correlation analyses for each of the two invariance 
groups. This can be done effectively only with a computer. 

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A complete canonical correlation analysis yields a considerable 
amount of data; of these data, only the standardized function 
coefficients are immediately relevant to our invariance 
procedure. These coefficients are presented in Table 3. 

Table 4 presents the data that will be used to compute an 
invariance estimate for the first canonical function in group 1. 
The first two columns of Table 4 represent respectively, for each 
invariance group, the predictor and criterion composite scores as 
computed from equations 7 - 10. The third and fourth columns 
present the invariance composite scores which, within each group, 
were computed from the other group's equations (equations 11 and 
12 for group 1). The following four equations illustrate how 
these values were computed for case 1, group 1. 

Predictor composite: 
(0.971)(-0.878) + (1.146)(1.321) = 0.661 

Criterion composite: 
(1.373)(0.309) + (0.751)(0.288) = 0.641 

Invariance predictor composite: 
(0.042)(-0.878) + (0.989)(1.321) = 1.269 

Invariance criterion composite: 
(1.174)(0.309) + (-0.249)(0.288) = 0.291 
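
The same four products can be checked numerically. The sketch below takes the z-scores for case 1 from Table 2 and the function 1 coefficients from Table 3; the dictionary layout and the helper name `composite` are illustrative choices, not part of the procedure itself. Results may differ from the printed values in the third decimal place, presumably because the printed inputs are themselves rounded.

```python
# Standardized scores for case 1, group 1 (Table 2).
case1 = {"A": -0.878, "B": 1.321, "X": 0.309, "Y": 0.288}

# Standardized function 1 coefficients for each group (Table 3).
coef_group1 = {"A": 0.971, "B": 1.146, "X": 1.373, "Y": 0.751}
coef_group2 = {"A": 0.042, "B": 0.989, "X": 1.174, "Y": -0.249}

def composite(scores, coefs, variables):
    """Weighted sum of the named variables for one case."""
    return sum(coefs[v] * scores[v] for v in variables)

# Group 1 data with group 1 coefficients (equations 7 and 8).
predictor = composite(case1, coef_group1, ("A", "B"))
criterion = composite(case1, coef_group1, ("X", "Y"))

# Group 1 data with group 2 coefficients (equations 11 and 12).
inv_predictor = composite(case1, coef_group2, ("A", "B"))
inv_criterion = composite(case1, coef_group2, ("X", "Y"))

print(round(predictor, 3), round(criterion, 3))
print(round(inv_predictor, 3), round(inv_criterion, 3))
```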

In the above equations, the data came from Table 2, and the 
coefficients from Table 3. 

We now have all the data necessary to compute the following 
four correlation coefficients: 

1) Rc(1,1): the "actual" canonical correlation coefficient 
for group 1; the correlation coefficient between group 1 
predictor and criterion composite scores (Table 4, columns 1 and 
2). 

2) Rc(1,2): the invariance correlation coefficient for group 
1; the correlation coefficient between group 1 invariance 
predictor and invariance criterion composite scores (Table 4, 
columns 3 and 4). 

3) Rc(2,2): the "actual" canonical correlation coefficient 
for group 2; the correlation coefficient between group 2 
predictor and criterion composite scores (Table 4, columns 1 and 
2). 

4) Rc(2,1): the invariance correlation coefficient for group 
2; the correlation coefficient between the invariance predictor 
and the invariance criterion composite scores (Table 4, columns 3 
and 4). 

Table 5 presents the squares of these four correlation 
coefficients in the following format: 

                         Source of function coefficients 
                         Group 1          Group 2 

Source of    Group 1     Rc(1,1)^2        Rc(1,2)^2 
data 
             Group 2     Rc(2,1)^2        Rc(2,2)^2 

The difference between the two entries in the first row is 
an invariance estimate for function 1, group 1, while the 
difference between the two entries in the second row is an 
invariance estimate for function 1, group 2. Considering the 
extremely small size of the data set, these invariance 
estimates (0.426 and 0.357 for groups 1 and 2 respectively) are 
not too bad. In a real research situation with a much larger 
sample, one would naturally hope for smaller estimates. 
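
As a check on the first row of Table 5, the two squared correlations for group 1 can be recomputed from the group 1 composite scores in Table 4. This is a minimal sketch: the helper `pearson` and the variable names are illustrative, and because the inputs are rounded to three decimals the results should be expected to match Table 5 only approximately.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

# Group 1 composite scores from Table 4 (cases 1-5).
actual_predictor = [0.661, -1.231, 0.507, 0.977, -0.914]
actual_criterion = [0.641, -0.961, 0.651, 0.880, -1.211]
invariance_predictor = [1.269, -0.364, -0.805, 0.766, -0.867]
invariance_criterion = [0.291, -0.436, -0.342, 2.100, -1.613]

rc11_squared = pearson(actual_predictor, actual_criterion) ** 2
rc12_squared = pearson(invariance_predictor, invariance_criterion) ** 2

# Invariance estimate for function 1, group 1.
invariance_estimate = rc11_squared - rc12_squared

print(round(rc11_squared, 3), round(rc12_squared, 3))
print(round(invariance_estimate, 3))
```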

Before closing it is worth pointing out again that 
invariance analysis is still a young and largely untested 
science, and the interpretation of invariance results is often a 
matter of the researcher's judgment. The reader should remember 
that invariance analysis has to do with the replicability, not 
the interpretation, of study results. Large effect sizes but poor 
invariance results will generally indicate that the variables 
included in the analysis do have a significant effect on the 
behavior of the study sample but that this effect cannot 
necessarily be generalized to the larger population. 
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Table 1 
Hypothetical data set 

           Predictor      Criterion 
           Variables      Variables 
Case        A     B        X     Y 

  1         0     7        4     7 
  2         0     4        2     5 
  3         5     3        3     9 
  4         2     6        7     2 
  5         2     3        0     8 
  6         9     9        9     7 
  7         5     1        3     0 
  8         1     8        8     7 
  9         2     2        4     0 
 10         5     2        5     8 
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Table 2 

Variable values standardized to z-score form 
within invariance groups 

              Predictor              Criterion 
              Variables              Variables 
Case          A         B            X         Y 

Group 1 
  1        -0.878     1.321        0.309     0.288 
  2        -0.878    -0.330       -0.464    -0.432 
  3         1.562    -0.881       -0.077     1.009 
  4         0.098     0.771        1.468    -1.514 
  5         0.098    -0.881       -1.236     0.649 

Group 2 
  6         1.470     1.216        1.236     0.644 
  7         0.192    -0.899       -1.082    -1.089 
  8        -1.086     0.952        0.850     0.644 
  9        -0.767    -0.635       -0.696    -1.090 
 10         0.192    -0.635       -0.309     0.892 
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Table 3 

Standardized canonical function coefficients 

Variable     Function 1     Function 2 

Group 1 
  A            0.971          0.724 
  B            1.146         -0.393 
  X            1.373          0.302 
  Y            0.751          1.189 

Group 2 
  A            0.042          1.028 
  B            0.989         -0.284 
  X            1.174         -0.958 
  Y           -0.249          1.494 
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Table 4 

Actual and invariance composite scores 

           Actual comp. score         Invariance comp. score 
Case       Predictor   Criterion      Predictor   Criterion 

Group 1 
  1          0.661       0.641          1.269       0.291 
  2         -1.231      -0.961         -0.364      -0.436 
  3          0.507       0.651         -0.805      -0.342 
  4          0.977       0.880          0.766       2.100 
  5         -0.914      -1.211         -0.867      -1.613 

Group 2 
  6          1.265       1.291          2.819       2.182 
  7         -0.881      -0.998         -0.844      -2.304 
  8          0.895       0.837          0.036       1.651 
  9         -0.660      -0.545         -1.471      -1.774 
 10         -0.620      -0.585         -0.541       0.245 



Table 5 

Squares of actual and invariance canonical 
correlation coefficients 

                         Source of function coefficients 
                         Group 1          Group 2 

Source of    Group 1     0.952            0.526 
data 
             Group 2     0.635            0.992 
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